Evaluating CHIRPS with Local Rainfall Data

This notebook evaluates the performance of CHIRPS rainfall data against local weather station observations in the Citarum Basin, Indonesia.
Author
Published

Friday, December 27, 2024

Modified

Tuesday, December 17, 2024

Abstract

WIP.

Keywords

CHIRPS, Citarum Watershed, Rainfall, Precipitation, Hydrology, Data Comparison, Data Analysis, Indonesia

Imagine two weather reporters, one in a satellite 🛰️ high above the Earth and one on the ground at a local weather station 📡. They’re both reporting on the same thing: rainfall 🌧️. The satellite reporter represents global, gridded rainfall datasets like CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data), which provide a broad, top-down view of rainfall patterns across vast regions. The ground reporter represents the network of local weather stations, collecting precise rainfall measurements at specific points on the Earth’s surface. Are these two reporters telling the same story about the rain? 🤔 How consistent are their reports? That’s what we’re going to explore in this notebook!

Essentially, we’ll be playing the role of fact-checkers, scrutinizing the rainfall data from both our “reporters.” We’ll use a variety of tools and techniques to analyze the data, create visualizations, and assess the reliability of each source. This will involve looking for trends, calculating statistics, and even comparing how they describe specific events, like heavy downpours. Understanding the strengths and weaknesses of both satellite-derived and ground-based rainfall data is crucial. It can help us improve hydrological models, inform water management strategies, and enhance our ability to predict and respond to extreme weather events, no matter where we are in the world. By the end of this notebook, we’ll have a clearer picture of how to interpret and utilize these different sources of rainfall information for a more comprehensive understanding of our planet’s precipitation patterns. Let’s get started!

About this notebook

This notebook provides an educational demonstration of how to analyze and compare rainfall data from different sources. It focuses on the process and is not meant to be a definitive research paper. It’s open-source, so you’re welcome to use and adapt it. If you find any errors or have suggestions, please help improve this resource by creating an issue on GitHub. Your input is greatly valued!

Work in Progress & Source Code

This notebook is currently under development. The complete source code and accompanying repository will be made publicly available once the draft is finalized. This early publication aims to share the initial findings and overall direction of the project. Stay tuned for updates!

Code
import geopandas as gpd
import pandas as pd
import plotly.graph_objects as go
import myfunc
import xarray as xr
import matplotlib.pyplot as plt
from IPython.display import display # noqa
import pytemplate # noqa
import seaborn as sns
import matplotlib.dates as mdates
from shapely.geometry import Polygon, Point
from scipy.spatial import Voronoi, KDTree
import calmap
from geopy.point import Point as geoPoint
import numpy as np
from pyproj import Transformer

1 Introduction

This chapter provides the foundational context for this notebook, outlining the critical role of accurate rainfall data in effective water resource management. It introduces the two primary data sources, CHIRPS and BBWS Citarum, and describes the Citarum River Basin as the study area.

1.1 Project Background and Objectives

Rainfall is a crucial element in managing water resources, especially in a region like the Citarum River Basin. Understanding how much rain falls, where it falls, and when it falls is essential for preventing floods, managing droughts, and ensuring a reliable water supply for communities and agriculture. This notebook focuses on comparing two different sources of rainfall data: one from a global satellite-based system called CHIRPS and another from local rain gauges operated by BBWS Citarum, which we consider as the ground truth. By examining how well these two datasets agree, we can gain valuable insights into the accuracy of the satellite data and its potential for improving water management practices in the region.

The main goal of this notebook is to see how well the CHIRPS rainfall data matches up with the measurements taken from rain gauges on the ground (BBWS Citarum). We want to find out if the satellite data is consistent with the ground truth, where they might differ, and what those differences might mean for understanding rainfall patterns in the Citarum River Basin. Ultimately, this comparison will help us determine if CHIRPS data can be a reliable tool for supporting water resource management decisions, especially in areas where ground-based measurements are limited.

1.2 Data Sources

This section will briefly introduce the two datasets we’re using in this project: CHIRPS and BBWS Citarum.

  • CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data): a quasi-global rainfall dataset spanning more than 35 years. Covering 50°S-50°N (all longitudes) from 1981 to near-present, CHIRPS combines the Climate Hazards Group’s in-house climatology (CHPclim), 0.05° resolution satellite imagery, and in-situ station data to create gridded rainfall time series for trend analysis and seasonal drought monitoring [1]. It’s especially helpful in areas where there aren’t many weather stations on the ground.

  • BBWS Citarum (Balai Besar Wilayah Sungai Citarum): This organization is responsible for managing water resources within the Citarum River Basin. It collects rainfall data using a network of rain gauges located throughout the basin. These rain gauges provide direct measurements of rainfall at specific points, which we consider our “ground truth” data. However, the gauges cover only specific areas within the Citarum River Basin. In this notebook, we use rainfall data from the automatic rain gauges operated by BBWS Citarum.

We will go into more detail about how we access and process the data from each source in the next chapter (Chapter 2: Data Acquisition and Preprocessing).

1.3 Study Area

In this notebook, we’ll be exploring rainfall data from a specific area in Indonesia called the Upper Citarum River Watershed (or “DAS Citarum Hulu” in Indonesian). Think of it as our area of interest for this project! This watershed is important for managing water in West Java. It’s a fairly large area, covering about 1,738 square kilometers: a bit larger than Greater London (about 1,570 km²) and more than twice the land area of New York City (about 780 km²).

Figure 1: Upper Citarum Watershed

Geographically, the Upper Citarum River Watershed sits between 6°45’ and 7°15’ South latitude and 107°21’ and 107°57’ East longitude. Parts of several cities and regencies fall within this watershed, including Bandung City, Cimahi City, Bandung Regency, and Sumedang Regency. You can see the location of the watershed in Figure 1.


2 Data Acquisition and Preprocessing

This chapter details the process of acquiring and preparing the rainfall data from our two sources: the satellite-based CHIRPS dataset and the ground-based measurements from BBWS Citarum rain gauges. We will outline the steps taken to download, clean, and align these datasets, ensuring they are compatible for a robust comparison within the Upper Citarum River Watershed for a defined time period. This meticulous preparation is crucial to ensure the accuracy and reliability of our subsequent analysis.

2.1 CHIRPS Data

Building upon our introduction, we now delve into the specifics of our data sources, beginning with the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS). In this section, we’ll detail how we obtained the CHIRPS data for our study area, the Upper Citarum River Watershed, from the ClimateSERV platform, a tool designed for visualizing and downloading historical and forecasted climate data. While CHIRPS data is available from various sources, we opted for ClimateSERV for its user-friendly interface, which produced the data in netCDF4 format.

2.1.1 Using ClimateSERV to Obtain CHIRPS Data

For this analysis, we’ll be using the ClimateSERV platform to obtain CHIRPS rainfall data specifically for the Upper Citarum River Watershed. ClimateSERV offers a user-friendly interface for downloading pre-processed climate data. Follow these steps to get the data:

Figure 2: CHIRPS Data Download from ClimateSERV
  1. Navigate to ClimateSERV: Open your web browser and go to https://climateserv.servirglobal.net/map.

  2. Set Area of Interest (AOI):

    • On the left panel, you’ll see “Statistical Query” and “Set Area of Interest.”
    • Click on the “Upload” tab under “Set Area of Interest.”
    • You can either drag and drop your shapefile (in .zip, .json, or .geojson format) representing the Upper Citarum River Watershed boundary or click to select the file from your computer. The map on the right will automatically zoom to the uploaded shapefile, as shown in Figure 2. If you don’t have a shapefile, you can use the “Draw” option to manually draw a rectangle around the area, but this method is less precise. In this case, we use the boundary of the Upper Citarum basin, which is highlighted with a blue line.
  3. Select Data Parameters:

    • Under “Select Data,” choose “Download Raw Data” for “Type of Request.”
    • Select “Observation” for “Dataset Type.”
    • Choose “UCSB CHIRPS Rainfall” as the “Data Source.”
    • Select “NetCDF” as the “Download Format.”
  4. Specify Date Range:

    • Set the “Date Range” according to your analysis period. For this example, let’s use 2007-01-01 as the start date and 2019-12-31 as the end date.
  5. Submit Query:

    • Click the “Submit Query” button. ClimateSERV will process your request. The download will contain a NetCDF file (.nc) with CHIRPS rainfall data clipped to the Upper Citarum River Watershed for the specified period.

By following these steps, we efficiently obtain CHIRPS data that is both spatially and temporally aligned with our study area and period, ready for further processing and analysis.

2.1.2 Opening and Inspecting CHIRPS Data

Having successfully downloaded our CHIRPS rainfall data from ClimateSERV, clipped precisely to the Upper Citarum River Watershed and covering the years 2007 to 2019, we now turn to the crucial next step: opening and inspecting this NetCDF file to understand its structure and contents. In this section, we’ll use specialized libraries to load the data, explore its dimensions, variables, and attributes, and gain a firm grasp of what this satellite-derived rainfall dataset truly represents before we move forward with any further processing or analysis. Essentially, we’re ready to unpack our data and see what’s inside, ensuring we have a solid foundation for comparing it with our ground-based measurements later on.

2.1.2.1 Loading the NetCDF File

With our CHIRPS data downloaded and ready, the first step is to load it into our computing environment so we can begin exploring its contents. For this task, we’ll be leveraging the powerful xarray library in Python, which is specifically designed for working with labeled, multi-dimensional datasets like the NetCDF files commonly used in climate science. Let’s store the path to our downloaded NetCDF file, which is conveniently named chirps_2007_2019_citarum_watershed.nc, in a variable called CHIRPS_PATH for easy reference.

Now, using xarray (often imported as xr), we can load the dataset with the xr.open_dataset() function. This function neatly unpacks the NetCDF file and stores its contents—including the precipitation data, spatial coordinates, timestamps, and metadata—into an xarray Dataset object, which we’ll call chirps_ds. This object will be our primary interface for interacting with the CHIRPS data throughout this analysis.
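The loading step can be sketched as follows. This is a minimal illustration, not the notebook's original cell: the file is assumed to sit in the working directory, and `load_chirps` and `make_standin_dataset` are hypothetical helper names. Because the real file may not be at hand, the sketch also builds a zero-filled stand-in with the same dimensions so the rest of the code can be tried out.

```python
import numpy as np
import pandas as pd
import xarray as xr

# File name taken from the text; its location on disk is an assumption.
CHIRPS_PATH = "chirps_2007_2019_citarum_watershed.nc"


def load_chirps(path: str) -> xr.Dataset:
    """Open the NetCDF file and return its contents as an xarray Dataset."""
    return xr.open_dataset(path)


def make_standin_dataset() -> xr.Dataset:
    """Build a zero-filled dataset with the same layout as the CHIRPS file.

    The values are placeholders, not rainfall data.
    """
    time = pd.date_range("2007-01-01", "2019-12-31", freq="D")
    lat = np.linspace(-7.225, -6.775, 10)
    lon = np.linspace(107.375, 107.975, 13)
    data = np.zeros((time.size, lat.size, lon.size), dtype="float32")
    return xr.Dataset(
        {"precipitation_amount": (("time", "latitude", "longitude"), data)},
        coords={"time": time, "latitude": lat, "longitude": lon},
    )


# With the real file in place: chirps_ds = load_chirps(CHIRPS_PATH)
chirps_ds = make_standin_dataset()
print(chirps_ds)
```

Printing the dataset gives a summary of dimensions, coordinates, and data variables similar in structure to the one shown in Table 1.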

Table 1: CHIRPS Dataset Structure
<xarray.Dataset> Size: 3MB
Dimensions:               (latitude: 10, longitude: 13, time: 4748)
Coordinates:
  * latitude              (latitude) float64 80B -7.225 -7.175 ... -6.825 -6.775
  * longitude             (longitude) float64 104B 107.4 107.4 ... 107.9 108.0
  * time                  (time) datetime64[ns] 38kB 2007-01-01 ... 2019-12-31
    spatial_ref           int64 8B 0
Data variables:
    precipitation_amount  (time, latitude, longitude) float32 2MB ...

We can now see that chirps_ds is an xarray.Dataset object in Table 1, which neatly organizes the data into dimensions, coordinates, data variables, and attributes. We can observe that our dataset has three dimensions: latitude, longitude, and time, representing the spatial and temporal components of our rainfall data. The latitude dimension has a size of 10, longitude has a size of 13, and time has a size of 4748. These dimensions correspond to the spatial grid of our data and the number of daily time steps, respectively.

In the next subsection, we will delve deeper into these dimensions and explore the coordinates associated with them. This will give us a more concrete understanding of the spatial resolution of our CHIRPS data and the exact time period it covers.

2.1.2.2 Examining Dataset Dimensions and Variables

Now that we’ve successfully loaded our CHIRPS data into an xarray Dataset, let’s take a closer look at its fundamental components: dimensions and variables.

Dimensions in xarray are like the axes of our data cube, defining the shape and size of our dataset. In the output provided earlier, we saw three dimensions:

  • latitude (latitude: 10): This dimension represents the north-south extent of our data, and it has a size of 10. This means our data covers 10 distinct latitude points.
  • longitude (longitude: 13): This dimension represents the east-west extent, with a size of 13, indicating 13 distinct longitude points.
  • time (time: 4748): This dimension represents the temporal extent, with a size of 4748. This corresponds to 4748 daily time steps, covering the period from January 1, 2007, to December 31, 2019.

Together, these dimensions tell us that our data is organized as a 10x13x4748 grid, representing a spatial grid of 10 latitudes by 13 longitudes, with each grid cell containing rainfall data for each of the 4748 days in our time period.
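As a quick sanity check on these numbers, the arithmetic can be reproduced with the standard library alone. The sketch below only assumes the start and end dates and the grid sizes quoted above.

```python
from datetime import date

start, end = date(2007, 1, 1), date(2019, 12, 31)

# Inclusive number of days between the first and last time stamps.
n_days = (end - start).days + 1

# Each leap year inside the period contributes one extra day.
leap_years = [
    y for y in range(start.year, end.year + 1)
    if y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)
]

# Total number of values in a (time, latitude, longitude) cube.
n_lat, n_lon = 10, 13
n_values = n_days * n_lat * n_lon

print(n_days)      # 4748
print(leap_years)  # [2008, 2012, 2016]
print(n_values)    # 617240
```

Thirteen years of 365 days plus three leap days gives 4748 daily steps, and 4748 × 10 × 13 gives the 617240 values that make up the precipitation array.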

Variables, on the other hand, hold the actual data values and metadata associated with each dimension. In our chirps_ds dataset, we have one primary data variable:

  • precipitation_amount (time, latitude, longitude): This variable holds the daily precipitation amount data, measured in millimeters (mm). Its shape matches the dimensions of our dataset (4748, 10, 13), meaning it contains a precipitation value for each combination of time, latitude, and longitude.

Let’s examine these components programmatically. We can access the dimensions directly from our chirps_ds object:

FrozenMappingWarningOnValuesAccess({'latitude': 10, 'longitude': 13, 'time': 4748})

This tells us, once again, that our dataset has a latitude dimension of size 10, a longitude dimension of size 13, and a time dimension of size 4748. Now that we’ve confirmed the dimensions, let’s proceed to access the precipitation_amount variable and inspect its attributes:

Table 2: CHIRPS Precipitation Amount Variable
<xarray.DataArray 'precipitation_amount' (time: 4748, latitude: 10,
                                          longitude: 13)> Size: 2MB
[617240 values with dtype=float32]
Coordinates:
  * latitude     (latitude) float64 80B -7.225 -7.175 -7.125 ... -6.825 -6.775
  * longitude    (longitude) float64 104B 107.4 107.4 107.5 ... 107.9 108.0
  * time         (time) datetime64[ns] 38kB 2007-01-01 2007-01-02 ... 2019-12-31
    spatial_ref  int64 8B 0
Attributes:
    long_name:              precipitation_amount
    units:                  mm
    accumulation_interval:  1 day
    comment:                Climate Hazards group InfraRed Precipitation with...
    cell_methods:           time: mean

Here’s what we can glean from Table 2:

  • DataArray: We’re dealing with an xarray.DataArray named precipitation_amount. This is the fundamental data structure in xarray for holding multi-dimensional labeled data.
  • Dimensions: The array has three dimensions: time (4748), latitude (10), and longitude (13), confirming what we saw earlier.
  • Data Type: The data is stored as float32, which means each precipitation value is a 32-bit floating-point number.
  • Coordinates:
    • latitude: The latitude values range from -7.225 to -6.775 degrees (negative values are south of the equator, so roughly 7.225°S to 6.775°S).
    • longitude: The longitude values range from 107.375 to 107.975 degrees East.
    • time: The time values span from 2007-01-01 to 2019-12-31, stored as datetime64[ns] objects.
  • Attributes:
    • long_name: “precipitation_amount” - a descriptive name for the variable.
    • units: “mm” - indicating that the values represent millimeters of rainfall.
    • accumulation_interval: “1 day” - confirming that these are daily precipitation values.
    • comment: “Climate Hazards group InfraRed Precipitation with Stations” - providing the source of the data.
    • cell_methods: “time: mean” - indicating that the values represent the mean precipitation over each day.

In essence, Table 2 tells us that the precipitation_amount variable holds daily precipitation data in millimeters, arranged in a time-latitude-longitude grid, and provides the necessary metadata to interpret these values accurately. This detailed understanding of our core variable is crucial as we proceed to further exploration and analysis.
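The coordinate ranges in Table 2 also let us recover the grid spacing. The sketch below is a small worked calculation that uses only the end points and sizes reported above; the kilometres-per-degree factor is a common approximation, not a value from the dataset.

```python
# Coordinate end points and sizes as reported in Table 2.
lat_first, lat_last, n_lat = -7.225, -6.775, 10
lon_first, lon_last, n_lon = 107.375, 107.975, 13

# n evenly spaced points span (n - 1) intervals.
lat_step = (lat_last - lat_first) / (n_lat - 1)
lon_step = (lon_last - lon_first) / (n_lon - 1)

print(round(lat_step, 3), round(lon_step, 3))  # 0.05 0.05

# Rough north-south cell size, taking one degree of latitude as about 111 km.
KM_PER_DEGREE_LAT = 111.0  # approximation
cell_km = lat_step * KM_PER_DEGREE_LAT
print(f"north-south cell size: roughly {cell_km:.1f} km")
```

Both spacings come out at 0.05°, which matches the CHIRPS resolution mentioned in Section 1.2 and means each grid cell is roughly 5 to 6 km on a side.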

2.1.3 Initial Data Visualization

In this section, we’ll start visualizing our CHIRPS precipitation data to get a better sense of the spatial patterns of rainfall within the Upper Citarum River Watershed. We’ll begin with some simple plots for specific days and then move on to a more interactive map-based visualization.

2.1.3.1 Visualizing Precipitation for Specific Days

One of the quickest ways to get a visual overview of our data is to use the built-in plotting capabilities of xarray. The .plot() method, when applied to a DataArray, automatically generates a map if the data has spatial dimensions (latitude and longitude), which is the case for our precipitation_amount variable.

For our initial visualization, let’s focus on the first three days of our dataset: January 1st, 2007 to January 3rd, 2007. We can select these specific days using xarray’s .sel() method, which allows us to slice the data based on coordinate values.
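A minimal sketch of this selection is shown below. It uses a small stand-in DataArray filled with random placeholder values (not CHIRPS data) so that it runs without the downloaded file; with the real data, `precip` would be `chirps_ds["precipitation_amount"]`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in array with the same dimension names as precipitation_amount.
# The random values are placeholders, not rainfall measurements.
rng = np.random.default_rng(0)
time = pd.date_range("2007-01-01", "2007-01-10", freq="D")
lat = np.linspace(-7.225, -6.775, 10)
lon = np.linspace(107.375, 107.975, 13)
precip = xr.DataArray(
    rng.random((time.size, lat.size, lon.size)).astype("float32"),
    coords={"time": time, "latitude": lat, "longitude": lon},
    dims=("time", "latitude", "longitude"),
    name="precipitation_amount",
)

# Label-based slicing on the time coordinate; both end points are included.
first_three = precip.sel(time=slice("2007-01-01", "2007-01-03"))

# Picking one time step leaves a 2D (latitude, longitude) field.
day_one = first_three.isel(time=0)

print(first_three.sizes)
print(day_one.dims)
```

Calling `.plot()` on a 2D slice such as `day_one` draws it as a colored grid with longitude and latitude on the axes, which is the kind of panel shown for each day.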

Here are the plots for these three days:

(a) January 1st, 2007
(b) January 2nd, 2007
(c) January 3rd, 2007
Figure 3: Precipitation from January 1st, 2007 to January 3rd, 2007

Here are some observations we can make based on Figure 3:

  • Spatial Variation: We can clearly see that rainfall is not uniform across the Upper Citarum River Watershed. Different areas receive different amounts of precipitation on each day.
  • Day-to-Day Changes: The spatial patterns of rainfall change from day to day. For example, on January 1st (Figure 3 (a)), the central part of the watershed appears to receive more rainfall than on the other days, while on January 2nd (Figure 3 (b)) the northern part appears to receive more.
  • Rainfall Amounts: The color scale indicates the amount of rainfall in millimeters (mm). We can see that some areas receive up to 8 mm of rainfall on these days, while others receive very little or none.

2.1.3.2 Creating an Interactive Choropleth Map

For this task, we’ll use the plotly library, which provides a high-level interface for creating interactive plots, including choropleth maps. We can leverage interactive maps to explore spatial rainfall patterns with greater flexibility. A choropleth map is an effective way to visualize our precipitation data, using color gradients within defined areas – in this case, grid cells – to represent rainfall amounts.

We’ve created an interactive choropleth map for January 1st, 2007 using the plotly.graph_objects library. This map (Figure 4) displays CHIRPS precipitation across the Upper Citarum River Watershed, with each grid cell colored according to its rainfall value. The watershed boundary is also overlaid, providing context.

Figure 4: Interactive Map of Precipitation on January 1st, 2007

Interactive Features:

  • Hover: Displays the location (latitude, longitude) and precipitation value of a cell on mouse hover.
  • Zoom: Allows closer examination of specific areas.
  • Pan: Enables map movement to focus on regions of interest.

This interactive map offers a powerful way to explore spatial rainfall patterns. It could be further enhanced by adding a time-selection slider or by linking it to time-series plots for specific locations in a Dash application.
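A choropleth needs one polygon per area, so the grid cells first have to be turned into polygons. The sketch below shows one way to do that with the standard library, assuming each cell is a 0.05° square centered on its coordinate. The function names and the "row-col" id scheme are hypothetical, and this is not necessarily how the map in Figure 4 was built.

```python
CELL_SIZE = 0.05  # CHIRPS grid spacing in degrees


def cell_polygon(lat_c: float, lon_c: float, size: float = CELL_SIZE) -> list:
    """Return a closed GeoJSON ring of [lon, lat] pairs around a cell center."""
    h = size / 2
    return [
        [lon_c - h, lat_c - h],
        [lon_c + h, lat_c - h],
        [lon_c + h, lat_c + h],
        [lon_c - h, lat_c + h],
        [lon_c - h, lat_c - h],  # repeat the first vertex to close the ring
    ]


def grid_to_geojson(lats: list, lons: list) -> dict:
    """Build one polygon feature per grid cell, identified by a 'row-col' id."""
    features = []
    for i, lat_c in enumerate(lats):
        for j, lon_c in enumerate(lons):
            features.append({
                "type": "Feature",
                "id": f"{i}-{j}",
                "properties": {"lat": lat_c, "lon": lon_c},
                "geometry": {
                    "type": "Polygon",
                    "coordinates": [cell_polygon(lat_c, lon_c)],
                },
            })
    return {"type": "FeatureCollection", "features": features}


# Cell centers reconstructed from the coordinate ranges of the dataset.
lats = [round(-7.225 + CELL_SIZE * i, 3) for i in range(10)]
lons = [round(107.375 + CELL_SIZE * j, 3) for j in range(13)]
grid = grid_to_geojson(lats, lons)
print(len(grid["features"]))  # 130
```

A plotly choropleth trace such as `go.Choroplethmapbox` accepts a `geojson` argument together with `locations` (the feature ids) and `z` (the value used for coloring), so the 130 cell ids would be paired with the 130 precipitation values for the chosen day.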


2.2 BBWS Citarum Data

Now that we’ve thoroughly explored the CHIRPS dataset, it’s time to turn our attention to our second source of rainfall data: the ground-based measurements collected by Balai Besar Wilayah Sungai (BBWS) Citarum. While the CHIRPS data provided us with a satellite-derived, gridded view of rainfall, the BBWS Citarum dataset offers a different perspective – direct measurements taken from rain gauges located right here on the ground within the Upper Citarum River Watershed.

2.2.1 Data Overview and Station Information

The BBWS Citarum, the authority responsible for managing water resources in the Citarum River Basin, operates a network of automatic rain gauge stations. For this analysis, we’ll be focusing on data from 16 of these stations, all situated within the Upper Citarum River Watershed. These stations provide valuable, localized rainfall measurements, which we’ll treat as our “ground truth” for comparison against the CHIRPS data.

It’s important to note that the exact process by which we obtained this BBWS Citarum data cannot be disclosed here. However, we can assure you that the data covers the same period as our CHIRPS analysis (2007 to 2019) and has already undergone thorough cleaning and quality control, so it is ready for analysis. The data is stored in CSV format, making it readily accessible for processing with standard data analysis tools. Table 3 provides an overview of the 16 rain gauge stations within the Upper Citarum River Watershed, including their names and coordinates.

Table 3: Rainfall Stations in Upper Citarum Watershed
    id             title                    station_name    latitude   longitude
 0  hk_P0406       Upper Citarum Watershed  Kayu Ambon     -6.820833  107.633056
 1  hk_P0424       Upper Citarum Watershed  Ciherang       -7.036944  107.580278
 2  hk_164         Upper Citarum Watershed  Jatiroke       -6.929722  107.788056
 3  hk_174P183     Upper Citarum Watershed  Cibereum       -7.191944  107.676667
 4  hk_P0407       Upper Citarum Watershed  Cipanas        -7.191111  107.608333
 5  hk_P0408       Upper Citarum Watershed  Cisondari      -7.120278  107.489167
 6  hk_P0409       Upper Citarum Watershed  Cileunca       -7.193056  107.544722
 7  hk_P04101560a  Upper Citarum Watershed  Lembang Meteo  -6.831944  107.627222
 8  hk_P0415       Upper Citarum Watershed  Dago Pakar     -6.880833  107.614444
 9  hk_P0416       Upper Citarum Watershed  Rancaekek      -6.958611  107.753889
10  hk_P0419       Upper Citarum Watershed  Margahayu I    -6.800278  107.656667
11  hk_P0420       Upper Citarum Watershed  Cipaku         -7.021389  107.702778
12  hk_P0421       Upper Citarum Watershed  Tanjungsari    -6.902778  107.796389
13  hk_P0422       Upper Citarum Watershed  Cibiru         -6.916111  107.716389
14  hk_P0423       Upper Citarum Watershed  Cikancung      -7.036944  107.825556
15  hk_196a        Upper Citarum Watershed  Cicalengka     -6.968889  107.847222

To further enhance our understanding of the spatial distribution of these rain gauge stations, let’s visualize their locations on an interactive map. Figure 5 below displays the 16 BBWS Citarum stations within the Upper Citarum River Watershed, overlaid on a map with the watershed boundary clearly marked.

Figure 5: Interactive Map of Rainfall Stations in Upper Citarum Watershed

From a visual inspection of Figure 5, we can make several key observations about the distribution of the BBWS Citarum rain gauge stations within the Upper Citarum River Watershed:

  • Coverage: The rain gauge stations are distributed across most of the Upper Citarum River Watershed, covering its northern and southern parts as well as much of its east-west extent. This provides reasonable spatial coverage for capturing rainfall variability within the region.
  • Clustering: There’s a noticeable cluster of stations in the northern part of the watershed, which suggests a focus on monitoring rainfall in that area, while the other stations are more spread out. There is also a noticeable absence of stations in the far west, where a large body of water is visible.
  • Boundary: All stations are confirmed to be located within the watershed boundary, as indicated by the grey line.
  • Relative Spacing: The stations in the southern and eastern parts of the watershed appear to be more evenly spaced, while those in the north, as mentioned, are more clustered.
  • East-West Distribution: There are more stations in the eastern half of the watershed.
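The east-west observation can be checked against the coordinates in Table 3. The sketch below splits the stations at the middle longitude of the watershed, using the bounds quoted in Section 1.3; treating that midpoint as the dividing line is an assumption made for illustration.

```python
# Station longitudes copied from Table 3.
station_lon = {
    "Kayu Ambon": 107.633056, "Ciherang": 107.580278,
    "Jatiroke": 107.788056, "Cibereum": 107.676667,
    "Cipanas": 107.608333, "Cisondari": 107.489167,
    "Cileunca": 107.544722, "Lembang Meteo": 107.627222,
    "Dago Pakar": 107.614444, "Rancaekek": 107.753889,
    "Margahayu I": 107.656667, "Cipaku": 107.702778,
    "Tanjungsari": 107.796389, "Cibiru": 107.716389,
    "Cikancung": 107.825556, "Cicalengka": 107.847222,
}

# Longitude bounds of the watershed from Section 1.3: 107°21' E to 107°57' E.
west_bound = 107 + 21 / 60
east_bound = 107 + 57 / 60
mid_lon = (west_bound + east_bound) / 2  # about 107.65

n_east = sum(lon > mid_lon for lon in station_lon.values())
n_west = len(station_lon) - n_east
print(n_east, n_west)  # 9 7
```

With this split, 9 of the 16 stations lie in the eastern half and 7 in the western half, which is consistent with the visual impression from the map.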

These initial observations suggest that the BBWS Citarum rain gauge network is strategically designed to capture the spatial variability of rainfall within the Upper Citarum River Watershed, with a higher density of stations in the north. This spatial distribution will be important to consider when we compare the BBWS Citarum data to the gridded CHIRPS data. To get a sense of how the station-based measurements relate to the broader spatial patterns captured by CHIRPS, here’s a map combining the station locations with CHIRPS precipitation data for January 1st, 2007 (Figure 6).

Figure 6: Interactive Map of Rainfall Stations and Precipitation on January 1st, 2007

2.2.2 Loading and Inspecting BBWS Citarum Data

Having visualized the spatial distribution of our rain gauge network, we now turn to the core of our data: the rainfall measurements themselves. In this section, we’ll outline the process of loading the BBWS Citarum rainfall data from its CSV format, examining its structure, and performing some initial checks to ensure its integrity.

As mentioned earlier, the BBWS Citarum rainfall data is stored in CSV (Comma Separated Values) files, a common and widely compatible format for tabular data. To work with this data in Python, we’ll use the pandas library, a powerful tool for data manipulation and analysis. The pandas function read_csv() is specifically designed to read data from CSV files into a DataFrame, a tabular data structure that’s ideal for our needs.
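A minimal sketch of this loading step is shown below. Since the real file cannot be shared, it reads a three-row stand-in from a string. The `date` column name and the rainfall values are assumptions for illustration, and only the station IDs come from Table 3.

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV. The "date" column name and the values
# are assumptions; only the station IDs are taken from Table 3.
csv_text = """date,hk_P0406,hk_P0424
2007-01-01,0.0,1.5
2007-01-02,12.0,0.0
2007-01-03,3.5,7.0
"""

bbws_df = pd.read_csv(
    io.StringIO(csv_text),  # replace with the path to the real CSV file
    index_col="date",       # use the date column as the row index
    parse_dates=True,       # parse the index into a DatetimeIndex
)

# Summary of index, columns, dtypes and non-null counts.
bbws_df.info()

# Missing values per station; all zeros means a complete record.
print(bbws_df.isna().sum())
```

Calling `.info()` on the full dataset is what produces a summary like the one that follows, with one column per station.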

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4748 entries, 2007-01-01 to 2019-12-31
Data columns (total 16 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   hk_P0406       4748 non-null   float64
 1   hk_P0424       4748 non-null   float64
 2   hk_164         4748 non-null   float64
 3   hk_174P183     4748 non-null   float64
 4   hk_P0407       4748 non-null   float64
 5   hk_P0408       4748 non-null   float64
 6   hk_P0409       4748 non-null   float64
 7   hk_P04101560a  4748 non-null   float64
 8   hk_P0415       4748 non-null   float64
 9   hk_P0416       4748 non-null   float64
 10  hk_P0419       4748 non-null   float64
 11  hk_P0420       4748 non-null   float64
 12  hk_P0421       4748 non-null   float64
 13  hk_P0422       4748 non-null   float64
 14  hk_P0423       4748 non-null   float64
 15  hk_196a        4748 non-null   float64
dtypes: float64(16)
memory usage: 630.6 KB

Here’s what we can glean from this output:

  • DatetimeIndex: The index of the DataFrame is a DatetimeIndex, meaning the rows are indexed by date. It contains 4748 entries, ranging from 2007-01-01 to 2019-12-31, confirming that our data covers the expected time period.
  • Columns: There are 16 columns, each representing a different rain gauge station. The column names correspond to the station IDs we saw in Table 3 (e.g., hk_P0406, hk_P0424, etc.).
  • Data Type: All 16 station columns have a data type of float64, indicating that they contain numerical floating-point values, which is appropriate for rainfall measurements.
  • Non-Null Count: Each column has 4748 non-null values, which matches the number of entries in the DatetimeIndex. This suggests that there is no missing data in the dataset, which is excellent for our analysis.

2.2.3 Initial Data Visualization

Now that we’ve loaded and inspected our BBWS Citarum rainfall data, let’s begin visualizing it to gain a more intuitive understanding of the temporal patterns at each station. For this initial exploration, we’ll use heatmaps generated with the seaborn library, which is excellent for creating informative and visually appealing statistical graphics.

A heatmap is an effective way to represent three dimensions of data: time on two axes (in this case, year and day of year) and the rainfall amount represented by color intensity. This allows us to quickly identify periods of high and low rainfall throughout the 13-year record at each station.

Figure 7 displays the daily rainfall amounts from 2007 to 2019 for all 16 BBWS Citarum stations. Each heatmap uses the same format and color scale, allowing for direct comparisons between stations.

Panels (one heatmap per station): Cibereum, Cibiru, Cicalengka, Ciherang, Cikancung, Cileunca, Cipaku, Cipanas, Cisondari, Dago Pakar, Jatiroke, Kayu Ambon, Lembang Meteo, Margahayu I, Rancaekek, Tanjungsari
Figure 7: Heatmap of Rainfall Data from BBWS Citarum

In these heatmaps:

  • X-axis: Represents the day of the year, divided into months.
  • Y-axis: Represents the year, spanning from 2007 to 2019.
  • Color Intensity: Indicates the amount of rainfall on a given day, with darker shades of red corresponding to higher rainfall amounts (in millimeters). The color scale is displayed on the right of each heatmap, and it’s consistent across all stations, ranging from 0 to 200 mm.
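
The year-by-day layout described above can be sketched with pandas. The example below reshapes a daily series into a table with one row per year and one column per day of year. The series holds random placeholder values, not gauge data, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Placeholder daily series for one station (random values, not gauge data).
dates = pd.date_range("2007-01-01", "2019-12-31", freq="D")
rng = np.random.default_rng(0)
rain = pd.Series(rng.gamma(shape=0.5, scale=10.0, size=dates.size), index=dates)

# One row per year, one column per day of year.
table = pd.DataFrame({
    "year": np.asarray(rain.index.year),
    "doy": np.asarray(rain.index.dayofyear),
    "rain": rain.to_numpy(),
}).pivot(index="year", columns="doy", values="rain")

print(table.shape)  # (13, 366)
```

Day 366 exists only in leap years, so that column is empty (NaN) for the other years. A table like this can be passed to `seaborn.heatmap`, for example with `vmin=0` and `vmax=200` to keep the color scale the same across stations.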

Key Observations from the Heatmaps:

  • General Wet and Dry Periods: Across most stations, we can observe a general pattern of higher rainfall amounts occurring during the beginning and end of the year, typically corresponding to the wet season in the region. The middle of the year tends to be drier.
  • Interannual Variability: There’s noticeable variation in rainfall patterns between different years.
  • Station-Specific Patterns: While the general wet/dry pattern holds for many stations, there are some differences in the specific timing and intensity of rainfall events. For instance, some stations show more frequent high-intensity events, while others have more evenly distributed rainfall.
  • Intensity Differences: The maximum rainfall intensity varies between stations. Some stations, like Cikancung (hk_P0423), reach over 175 mm in a single day, while others, like Kayu Ambon (hk_P0406) have lower maximum values.

These heatmaps provide a valuable overview of the rainfall patterns captured by each BBWS Citarum station. They highlight both commonalities and differences in the temporal distribution of rainfall across the Upper Citarum River Watershed. This visual exploration will serve as a foundation for our subsequent quantitative analysis, where we’ll compare these patterns with those derived from the CHIRPS dataset.


2.3 Data Alignment and Synchronization

With our CHIRPS and BBWS Citarum data loaded and inspected, we now come to a crucial step: aligning these datasets in both time and space. This is essential to ensure a fair and meaningful comparison. We need to be certain that when we’re comparing a particular date in the CHIRPS data, it aligns with the same date in the BBWS Citarum data. Likewise, we need to consider how to best compare the gridded CHIRPS data with the point-based measurements from the BBWS stations. In this section, we’ll outline our approach to achieve this alignment and synchronization.

2.3.1 Temporal Alignment

A fundamental step in our data preparation is ensuring that both the CHIRPS and BBWS Citarum datasets cover the same time period. We need to verify that when we analyze data for a specific date, say January 1st, 2010, both datasets provide measurements for that exact day. Fortunately, both datasets were downloaded and prepared to cover the same time period. As we saw in the previous sections, both our CHIRPS dataset and our BBWS Citarum dataset have been confirmed to span from January 1, 2007, to December 31, 2019.

This alignment can be confirmed programmatically by checking the time coordinates of the CHIRPS dataset and the index of the BBWS Citarum dataset:

CHIRPS time range: 2007-01-01T00:00:00.000000000 to 2019-12-31T00:00:00.000000000
BBWS Citarum time range: 2007-01-01 00:00:00 to 2019-12-31 00:00:00

From the output, we can see that the time ranges of the two datasets match. It’s important to note that the CHIRPS data is recorded in Coordinated Universal Time (UTC+0), while the BBWS Citarum data is recorded in the local time zone of Jakarta, Indonesia (UTC+7), where the study area is located.

Because of this 7-hour offset, the two 24-hour accumulation windows do not coincide exactly: rain that falls at 8 PM UTC falls at 3 AM the next day in UTC+7, so an event close to the day boundary can be assigned to adjacent dates in the two datasets. In this notebook we compare daily totals by calendar date and do not apply a time zone correction, so this offset should be kept in mind when interpreting day-to-day differences. With that caveat, we have established that both datasets cover the same period at a daily level. We can now proceed to spatial alignment, which we discuss in the next section.
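The range check mentioned above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the two `pd.date_range` objects are hypothetical stand-ins for the real time axes (which could be obtained, for example, with `chirps_ds["time"].to_index()` and `bbws_data_original.index`), and `same_daily_coverage` is a helper name introduced here.

```python
import pandas as pd

# Hypothetical stand-ins for the real time axes of the two datasets.
chirps_time = pd.date_range("2007-01-01", "2019-12-31", freq="D")
bbws_time = pd.date_range("2007-01-01", "2019-12-31", freq="D")


def same_daily_coverage(a: pd.DatetimeIndex, b: pd.DatetimeIndex) -> bool:
    """Return True if both indexes contain exactly the same calendar dates."""
    return a.normalize().equals(b.normalize())


print("CHIRPS time range:", chirps_time.min().date(), "to", chirps_time.max().date())
print("BBWS time range:  ", bbws_time.min().date(), "to", bbws_time.max().date())
print("Same dates:", same_daily_coverage(chirps_time, bbws_time))

# Dates present in CHIRPS but absent from BBWS (empty when fully aligned).
missing_in_bbws = chirps_time.difference(bbws_time)
```

Comparing the full index, not only the first and last dates, also catches gaps in the middle of either record.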

2.3.2 Spatial Matching: BBWS Citarum Data

While we’ve ensured temporal alignment, we now face the challenge of spatial alignment. The BBWS Citarum data consists of point measurements taken at specific rain gauge locations. On the other hand, CHIRPS provides rainfall data on a spatial grid. To make meaningful comparisons, we need to consider how to represent the point-based BBWS Citarum data in a way that aligns with the spatial nature of the CHIRPS data. In this section, we’ll discuss three approaches we will use to represent the BBWS Citarum station data.

2.3.2.1 Original Station Data

The most straightforward approach is to use the raw, daily rainfall measurements from each of the 16 BBWS Citarum stations directly, as seen in the output from our inspection in section 2.2.2 Loading and Inspecting BBWS Citarum Data. This method involves no spatial averaging or interpolation; we treat each station’s measurement as an independent observation at its specific location. For instance, the rainfall recorded at station hk_P0406 is used to represent rainfall at that specific point, without considering any data from neighboring stations.

This approach is ideal for a direct point-to-point comparison with CHIRPS data extracted at the corresponding station locations. While it does not capture spatial variability across the watershed, it will serve as our baseline for assessing CHIRPS data at the specific points where we have ground truth measurements.

Table 4: Rainfall Data from BBWS Citarum (first 5 rows)
date hk_P0406 hk_P0424 hk_164 hk_174P183 hk_P0407 hk_P0408 hk_P0409 hk_P04101560a hk_P0415 hk_P0416 hk_P0419 hk_P0420 hk_P0421 hk_P0422 hk_P0423 hk_196a
2007-01-01 0.0 17.5 0.1 16.4 2.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 2.0 1.0 1.0
2007-01-02 1.2 0.0 0.6 5.4 0.0 13.2 0.0 2.2 0.0 0.0 0.0 0.0 0.0 1.0 0.0 4.0
2007-01-03 0.0 0.0 0.0 0.0 0.0 11.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2007-01-04 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2007-01-05 0.0 0.0 0.3 4.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

The table above (Table 4) shows the first five days of our BBWS Citarum dataset, demonstrating the raw rainfall measurements (in millimeters) at each of the 16 stations. Each column represents a specific station, and each row represents a date. As you can see, on certain days, some stations recorded rainfall while others didn’t, highlighting the spatial variability of rainfall.

2.3.2.2 Watershed-Wide Average

While using individual station data provides a detailed view, it can be useful to summarize the overall rainfall across the entire Upper Citarum River Watershed. This is particularly relevant for comparing with area-averaged CHIRPS data. For this, we can calculate a simple arithmetic mean of the rainfall measured at all 16 BBWS Citarum stations for each day. This approach treats all stations as equally representative of the watershed’s rainfall and gives us a single value for each day, representing a watershed-wide average.

This watershed-wide average is computed by summing the rainfall values from all 16 stations and dividing by 16 for each day. This method is straightforward and easy to implement. However, it’s important to remember that it assumes a uniform spatial distribution of rainfall, and it does not account for the actual spatial distribution of stations or any potential spatial bias in their coverage.

To compute this, we use the following code, which also converts the result into a pandas DataFrame for easier handling later. The watershed-wide average is stored in the variable bbws_data_avg for subsequent use.

bbws_data_avg = bbws_data_original.mean(axis=1).to_frame(name="rainfall_avg")
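As a quick sanity check, the same calculation can be applied to the first two days shown in Table 4. The sketch below rebuilds those two rows by hand (the name `sample` is introduced here for illustration only) and applies the same `mean(axis=1)` call.

```python
import pandas as pd

station_ids = [
    "hk_P0406", "hk_P0424", "hk_164", "hk_174P183", "hk_P0407", "hk_P0408",
    "hk_P0409", "hk_P04101560a", "hk_P0415", "hk_P0416", "hk_P0419",
    "hk_P0420", "hk_P0421", "hk_P0422", "hk_P0423", "hk_196a",
]

# First two rows of Table 4 (rainfall in mm), copied by hand.
rows = {
    "2007-01-01": [0.0, 17.5, 0.1, 16.4, 2.0, 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 1.0],
    "2007-01-02": [1.2, 0.0, 0.6, 5.4, 0.0, 13.2, 0.0, 2.2,
                   0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0],
}
sample = pd.DataFrame.from_dict(rows, orient="index", columns=station_ids)
sample.index = pd.to_datetime(sample.index)
sample.index.name = "date"

# Same operation as in the notebook: arithmetic mean across the 16 stations.
sample_avg = sample.mean(axis=1).to_frame(name="rainfall_avg")
print(sample_avg)
```

The station totals are 42.0 mm and 27.6 mm, so the averages are 2.625 mm and 1.725 mm. Note that `mean(axis=1)` skips missing values by default, so on a day with gaps the average is taken over the reporting stations only.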

To visualize the temporal patterns of this watershed-wide average daily rainfall, we employ a calendar heatmap. This visualization effectively displays daily rainfall amounts, aggregated across all stations, using a color scale to represent the magnitude of rainfall for each day, arranged in a familiar calendar format. Figure 8 displays the daily average rainfall for each year from 2007 to 2019.

Figure 8: Calmap of Average Rainfall Data from BBWS Citarum (panels: 2007-2011, 2012-2016, 2017-2019)

Several key observations can be made from the calendar heatmap:

  • Seasonal Patterns: A clear seasonal pattern is evident, with the rainy season typically occurring around the beginning and end of each year (October to April), while drier periods are observed in the middle of the year (around June to September). This aligns with the known wet and dry seasons in the region.
  • Inter-annual Variability: While the general seasonal pattern holds, there is noticeable variation in the intensity and timing of rainfall across different years. For instance, some years exhibit more intense rainfall events (darker squares) than others, and the onset and duration of the rainy season can vary.
  • Extreme Events: A few days stand out with particularly dark blue squares, indicating heavy rainfall events. These events are more common during the rainy season but can also occur sporadically during other times of the year.
  • Driest Years: 2015 and 2019 appear to be the driest years overall, with fewer dark blue squares and more light-colored squares, suggesting lower rainfall amounts throughout the year.
  • Wettest Year: Conversely, 2010 seems to be a relatively wet year, with a higher frequency of darker blue squares, particularly during the typical rainy season months.

This calendar heatmap visualization provides a concise yet informative overview of the daily average rainfall patterns in the Upper Citarum River Watershed over the 13-year period. It effectively highlights seasonal trends, inter-annual variability, and extreme events. It’s important to remember that this analysis is based on a simple watershed-wide average, which, by its nature, does not capture the spatial variability of rainfall within the watershed. It treats the entire area as if it received the same amount of rainfall each day, potentially masking important local variations. Despite this simplification, the watershed-wide average provides a valuable baseline for comparison with area-averaged CHIRPS data, which we will explore in the following sections.

2.3.2.3 Thiessen Polygon Weighted Average

In the previous sections, we explored two ways to represent the BBWS Citarum rainfall data: using the original station measurements and calculating a simple watershed-wide average. Now, we introduce a third, more spatially refined approach using a technique called Thiessen polygons, also known as Voronoi diagrams. While this might sound a bit technical, the underlying idea is quite intuitive and we’ll explain it in a way that’s easy to grasp.

The goal here is to create a weighted average of rainfall across the Upper Citarum River Watershed, but unlike the simple average, we want to give more weight to stations that represent larger areas. Thiessen polygons help us determine those “areas of influence” for each station.

What are Thiessen Polygons?

Imagine the Upper Citarum River Watershed as a map, and the 16 BBWS Citarum rain gauge stations as points scattered across it. Now, imagine drawing boundaries around each station in such a way that every location within a boundary is closer to that station than to any other station. These boundaries create a set of polygons, each enclosing an area that is “represented” by its corresponding station. These are Thiessen polygons.

Think of it like this: if each station were a city, the Thiessen polygons would be like the territories around each city. Every point within a territory is closer to its own city than to any neighboring city. Figure 9 shows a simplified visualization of this concept for our watershed.

Figure 9: Interactive Map of Rainfall Stations and Thiessen Polygons within Upper Citarum Watershed

Each polygon in Figure 9 represents the “area of influence” of its rain gauge station. In other words, we assume that the rainfall measured at a particular station is representative of the rainfall within its entire polygon. This method acknowledges that stations are not uniformly distributed and that some stations represent larger areas than others.
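The "closest station" rule can be turned into approximate area weights without constructing any polygons: sample the area densely, assign every sample point to its nearest station, and count. The sketch below does this on a hypothetical rectangular domain; the function name, the rectangular bounds, and the two-station layout are assumptions made for illustration, whereas the notebook derives its weights from the actual polygons clipped to the watershed boundary.

```python
import numpy as np


def approx_thiessen_weights(stations_xy, bounds, n_side=200):
    """Approximate Thiessen area fractions on a rectangular domain.

    stations_xy: (n_stations, 2) array of x, y coordinates.
    bounds: (xmin, ymin, xmax, ymax) of the hypothetical domain.
    """
    stations_xy = np.asarray(stations_xy, dtype=float)
    xmin, ymin, xmax, ymax = bounds
    # Cell-centred sample points, each standing for an equal share of area.
    xs = xmin + (np.arange(n_side) + 0.5) * (xmax - xmin) / n_side
    ys = ymin + (np.arange(n_side) + 0.5) * (ymax - ymin) / n_side
    gx, gy = np.meshgrid(xs, ys)
    points = np.column_stack([gx.ravel(), gy.ravel()])
    # Squared distance from every sample point to every station.
    d2 = ((points[:, None, :] - stations_xy[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(stations_xy))
    return counts / counts.sum()


# Two stations placed symmetrically in a unit square share the area equally.
stations = np.array([[0.25, 0.5], [0.75, 0.5]])
weights = approx_thiessen_weights(stations, (0.0, 0.0, 1.0, 1.0))
print(weights)
```

Because the weights are area fractions, they always sum to 1, which is what lets them be used directly in a weighted average.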

Calculating the Weighted Average

We have already calculated the Thiessen polygon weights for each station, which are shown in Figure 9. Each weight represents the proportion of the watershed area covered by that station’s polygon. Now, we can use these weights to calculate a daily weighted average rainfall for the entire Upper Citarum River Watershed.

To do this, for each day in our dataset, we multiply each station’s recorded rainfall by its corresponding weight. Then, we sum these weighted rainfall values to obtain the Thiessen polygon weighted average rainfall for the entire watershed for that day. This can be expressed mathematically as:

\[ \text{Weighted Average Rainfall}_d = \sum_{i=1}^{16} (\text{Rainfall}_{i,d} * \text{Weight}_i) \]

where:

  • \(d\) is the day index.
  • \(\text{Weighted Average Rainfall}_d\) is the Thiessen polygon weighted average rainfall for day \(d\).
  • \(i\) is the station index (ranging from 1 to 16).
  • \(\text{Rainfall}_{i,d}\) is the rainfall recorded at station \(i\) on day \(d\).
  • \(\text{Weight}_i\) is the Thiessen weight for station \(i\) (see Figure 9).

We perform this calculation for every day in our dataset, resulting in a time series of daily Thiessen polygon weighted average rainfall values. To perform this calculation, we will utilize the following code:

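# Look up each station's Thiessen weight, ordered to match the rainfall columns
weighted_stations = bbws_stations_area_gdf.set_index('id').reindex(bbws_data_original.columns)['weighted_area']

# Multiply each station's rainfall by its weight, then sum across stations for each day
bbws_data_thiessen_avg = bbws_data_original.mul(weighted_stations, axis=1).sum(axis=1).to_frame(name="weighted_avg_tp_rainfall")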

The weighted average rainfall data is stored in the bbws_data_thiessen_avg variable.
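To make the effect of the weights concrete, here is a toy version of the same calculation with three hypothetical stations (`st_a`, `st_b`, `st_c`) and made-up weights. It is only a sketch: none of these names or numbers come from the BBWS Citarum data.

```python
import pandas as pd

# Hypothetical daily rainfall (mm) at three stations.
rain = pd.DataFrame(
    {"st_a": [10.0, 0.0], "st_b": [0.0, 4.0], "st_c": [2.0, 2.0]},
    index=pd.to_datetime(["2007-01-01", "2007-01-02"]),
)

# Hypothetical area weights, deliberately listed in a different order.
weights = pd.Series({"st_c": 0.2, "st_a": 0.5, "st_b": 0.3})
# Align the weights with the rainfall columns by label, as the notebook does.
weights = weights.reindex(rain.columns)
assert abs(weights.sum() - 1.0) < 1e-9  # area fractions should sum to 1

weighted = rain.mul(weights, axis=1).sum(axis=1)
simple = rain.mean(axis=1)
print(pd.DataFrame({"weighted": weighted, "simple": simple}))
```

On the first day the heavy rain falls at the station with the largest weight, so the weighted average (5.4 mm) exceeds the simple average (4.0 mm); on the second day the opposite happens (1.6 mm versus 2.0 mm). One caveat worth keeping in mind: `sum(axis=1)` skips missing values, so a station with a gap contributes nothing on that day unless the remaining weights are renormalized.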

Visualization

To visualize the temporal patterns of this weighted average, we can use a calendar heatmap, just like we did for the simple watershed-wide average. Figure 10 displays the Thiessen polygon weighted average daily rainfall for each year from 2007 to 2019.

Figure 10: Calmap of Thiessen Polygon Weighted Average Rainfall Data from BBWS Citarum (panels: 2007-2011, 2012-2016, 2017-2019)

Comparing Figure 10 (Thiessen weighted average) with Figure 8 (simple average), we can observe the following key differences:

  • Overall Similarity: The general patterns of wet and dry periods are very similar, as shown by the similar distribution of darker and lighter shades across the years. Both methods clearly capture the major seasonal trends.
  • Slightly Enhanced Extremes: In the Thiessen weighted average (Figure 10), extremely wet or dry periods appear slightly more pronounced. For example, some of the wettest days in 2010 and the driest days in 2015 and 2019 stand out a bit more with darker or lighter shades, respectively, compared to the simple average (Figure 8).

In essence, while both methods provide a good overview of rainfall patterns, the Thiessen weighted average shows a subtle enhancement of rainfall extremes, likely due to the influence of stations with larger weights. This suggests that the Thiessen method might be more sensitive to localized heavy rainfall or drought events. A more detailed, quantitative comparison of these two methods, along with an analysis of their differences, will be presented in the next chapter (Chapter 3: Exploratory Data Analysis). For now, these visualizations serve as a quick, initial check to ensure our data processing steps are producing reasonable results.

2.3.3 Spatial Matching: CHIRPS Data

Now that we have explored different ways to represent the BBWS Citarum station data spatially, we need to do the same for our gridded CHIRPS data. Unlike the point-based BBWS Citarum measurements, CHIRPS data comes as a grid of rainfall estimates. Therefore, we need to figure out how to extract and/or aggregate this gridded data to compare it with our BBWS Citarum data. In this section, we’ll discuss several methods for extracting and aggregating CHIRPS data, ranging from simple averages to more refined techniques. These methods will allow us to compare the CHIRPS estimates with the BBWS Citarum measurements, whether on a point-by-point basis or by considering broader spatial patterns.

2.3.3.1 Watershed-Wide Average

The first method we’ll explore is calculating a watershed-wide average rainfall from the CHIRPS dataset. This is analogous to the watershed-wide average we calculated for the BBWS Citarum data, and it will allow us to compare the overall rainfall estimates from both sources for the Upper Citarum River Watershed.

To calculate the watershed-wide average CHIRPS rainfall, we’ll follow these steps:

  1. Identify Grid Cells within the Watershed: We need to determine which CHIRPS grid cells fall within the boundaries of the Upper Citarum River Watershed. We can achieve this by using the watershed boundary shapefile that we used to clip the CHIRPS data in ClimateSERV in section 2.1.1 Using ClimateSERV to Obtain CHIRPS Data.

  2. Calculate Daily Average: Once we know which grid cells are inside the watershed, we can calculate the simple arithmetic mean of the rainfall values of those cells for each day.

    Mathematically, this can be represented as:

    \[ \text{Watershed Average Rainfall}_{CHIRPS,d} = \frac{1}{N} \sum_{i=1}^{N} R_{i,d} \]

    Where:

    • \(\text{Watershed Average Rainfall}_{CHIRPS,d}\) is the average CHIRPS rainfall for the entire watershed on day \(d\).
    • \(N\) is the number of CHIRPS grid cells within the watershed.
    • \(R_{i,d}\) is the rainfall value from CHIRPS grid cell \(i\) on day \(d\).

This calculation will result in a single daily rainfall value representing the average rainfall over the entire Upper Citarum River Watershed, according to the CHIRPS dataset.
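The formula above amounts to a masked mean over the grid. A minimal NumPy sketch, using a tiny made-up rainfall cube and a hypothetical boolean mask in place of the real CHIRPS grid and watershed boundary:

```python
import numpy as np

# Hypothetical CHIRPS cube: 2 days x 2 latitudes x 3 longitudes (mm/day).
rain = np.array([
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]],
])

# True where a grid cell counts as inside the watershed (assumed mask).
inside = np.array([
    [False, True, True],
    [False, True, False],
])

# Cells outside the watershed become NaN so they drop out of the mean.
masked = np.where(inside, rain, np.nan)

# Average over the two spatial axes, leaving one value per day.
daily_mean = np.nanmean(masked, axis=(1, 2))
print(daily_mean)
```

Here N = 3 cells are inside the mask, so the two daily values are (2 + 3 + 5) / 3 and (0 + 0 + 4) / 3. With xarray, the same idea is typically expressed by masking or clipping the data and then averaging over the latitude and longitude dimensions.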

To perform this calculation, we can use rioxarray’s clip method (available through the rio accessor) to select the grid cells within the watershed and then take the mean across the spatial dimensions (latitude, longitude) for each day. The code below does this and stores the final result in a pandas DataFrame called chirps_data_avg.

# Optional here: the CHIRPS data was already downloaded clipped to the watershed boundary
chirps_clipped = chirps_ds.rio.clip(
    citarum_hulu_gdf.geometry,
    drop=True,  # Remove cells outside the geometry
    # False keeps only cells whose center lies inside the geometry (True would keep
    # every cell the geometry touches); False matches the ClimateSERV default
    all_touched=False
)

# Average over both spatial dimensions to get one value per day
chirps_data_avg = chirps_clipped["precipitation_amount"].mean(dim=["latitude", "longitude"]).to_dataframe().drop(columns="spatial_ref")

To visualize this watershed-wide average CHIRPS rainfall, we’ll create a calendar heatmap similar to the one we made for the BBWS Citarum average. Figure 11 displays the daily average CHIRPS rainfall for each year from 2007 to 2019.

Figure 11: Calmap of CHIRPS Watershed-Wide Average Rainfall Data (panels: 2007-2011, 2012-2016, 2017-2019)

The most striking pattern in Figure 11 is a distinct seasonal cycle. Higher rainfall amounts, indicated by darker squares, are predominantly observed during the beginning and end of each year, corresponding to the typical wet season in the region. Conversely, the middle months of the year tend to exhibit lower rainfall amounts, shown as lighter squares, aligning with the expected dry season. This cyclical pattern is consistent across all years, although some variations in the intensity and timing of rainfall are apparent.

It’s important to remember that this visualization represents a simple average of all CHIRPS grid cells within the watershed, which might smooth out some of the local variations in rainfall. While this watershed-wide average provides a useful overview, it’s also important to examine how CHIRPS data compares to individual BBWS Citarum stations. In the next section, we will explore a method for extracting CHIRPS data at the nearest grid cell to each BBWS Citarum station, allowing for a more direct, point-to-point comparison.

2.3.3.2 Nearest Grid Cell

In the previous section, we calculated a watershed-wide average of CHIRPS rainfall, providing a single value for each day representing the entire Upper Citarum River Watershed. While useful for an overall comparison, this approach doesn’t allow us to assess how well CHIRPS represents rainfall at specific locations where BBWS Citarum stations are situated.

In this section, we introduce a more direct approach: extracting the rainfall value from the CHIRPS grid cell that is closest to each BBWS Citarum station. This method allows us to perform a point-to-point comparison between CHIRPS data and the ground-based measurements, giving us a better sense of how well CHIRPS captures rainfall at specific locations.

Concept

The core idea is simple: for each of the 16 BBWS Citarum rain gauge stations, we find the CHIRPS grid cell whose center is geographically closest to that station. We then extract the rainfall time series from that nearest grid cell and use it as the CHIRPS estimate for that station’s location.

Methodology

To implement this, we’ll follow these steps:

  1. Determine Nearest Grid Cell: For each BBWS Citarum station, we calculate the distance to the center of every CHIRPS grid cell within our study area. We then identify the grid cell with the minimum distance.

  2. Extract Time Series: Once we’ve identified the nearest grid cell for a station, we extract the entire daily rainfall time series (2007-2019) from that cell.

  3. Repeat for All Stations: We repeat steps 1 and 2 for all 16 BBWS Citarum stations.

Mathematical Representation

For each BBWS Citarum station s, we can represent this process mathematically as:

\[ \text{Nearest Grid Cell}_s = \arg\min_{i} \text{Distance}(\text{Station}_s, \text{Grid Cell}_i) \]

\[ \text{CHIRPS Rainfall}_{s,d} = \text{Rainfall}(\text{Nearest Grid Cell}_s, d) \]

Where:

  • \(\text{Station}_s\) represents the coordinates of BBWS Citarum station s.
  • \(\text{Grid Cell}_i\) represents the coordinates of the center of CHIRPS grid cell i.
  • \(\text{Distance}()\) is a function calculating the geographical distance (e.g., great-circle distance) between two points.
  • \(\arg\min\) returns the index i of the grid cell that minimizes the distance.
  • \(\text{Nearest Grid Cell}_s\) is the index of the CHIRPS grid cell closest to station s.
  • \(\text{CHIRPS Rainfall}_{s,d}\) is the CHIRPS rainfall value at the nearest grid cell to station s on day d.
  • \(\text{Rainfall}()\) is a function that returns the rainfall value for a given grid cell index and day.
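The two equations above can be sketched with `scipy.spatial.KDTree`. Everything in this example is illustrative: the 3 x 3 grid, the two station coordinates (`st_a`, `st_b`), and the rainfall cube are made up, and distances are plain Euclidean distances in degrees, which is only an approximation of great-circle distance (reasonable over a small area near the equator).

```python
import numpy as np
from scipy.spatial import KDTree

# Hypothetical grid cell centers at 0.05 degree spacing.
lats = np.array([-7.05, -7.00, -6.95])
lons = np.array([107.50, 107.55, 107.60])
lon_grid, lat_grid = np.meshgrid(lons, lats)  # both have shape (n_lat, n_lon)
cell_centers = np.column_stack([lon_grid.ravel(), lat_grid.ravel()])

# Hypothetical station coordinates as (lon, lat).
stations = {"st_a": (107.51, -7.04), "st_b": (107.59, -6.96)}
station_xy = np.array(list(stations.values()))

# Nearest Grid Cell_s = argmin_i Distance(Station_s, Grid Cell_i)
tree = KDTree(cell_centers)
dist, idx = tree.query(station_xy)  # k=1: single nearest cell per station

# Convert flat cell indices back to (lat, lon) positions on the grid.
lat_idx, lon_idx = np.unravel_index(idx, lat_grid.shape)

# Hypothetical rainfall cube with shape (time, lat, lon).
rain = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)

# CHIRPS Rainfall_{s,d}: one column per station, one row per day.
series = rain[:, lat_idx, lon_idx]
print(dict(zip(stations, zip(lat_idx.tolist(), lon_idx.tolist()))))
```

Building the tree once and querying all stations together keeps the lookup cheap even for a much larger grid.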

Implementation

We’ve employed a combination of xarray and scipy.spatial.KDTree to efficiently pinpoint the nearest CHIRPS grid cell to each BBWS Citarum station and extract the corresponding rainfall time series. The result is shown in Figure 12, which displays the spatial relationship between the BBWS Citarum rain gauge stations (shown as larger blue circles) and their nearest CHIRPS grid cell centers (shown as smaller green circles).

Figure 12: Interactive Map of Nearest CHIRPS to BBWS Stations

To give us a clearer idea of the data we’ve extracted, Table 5 presents the first five days of rainfall data from the nearest CHIRPS grid cells corresponding to each BBWS Citarum station.

Table 5: Rainfall Data from Nearest CHIRPS Grid Cell to BBWS Citarum Stations
date hk_P0406 hk_P0424 hk_164 hk_174P183 hk_P0407 hk_P0408 hk_P0409 hk_P04101560a hk_P0415 hk_P0416 hk_P0419 hk_P0420 hk_P0421 hk_P0422 hk_P0423 hk_196a
2007-01-01 3.265238 3.739876 3.480353 0.000000 0.000000 0.000000 0.000000 3.265238 2.211674 4.992247 3.612763 6.574951 3.480353 3.284562 5.153909 5.340987
2007-01-02 6.530475 0.000000 6.960705 0.000000 0.000000 0.000000 3.679765 6.530475 4.423348 2.496123 7.225526 0.000000 6.960705 6.569124 2.576955 2.670493
2007-01-03 3.265238 3.739876 6.960705 4.180919 4.973224 7.547643 3.679765 3.265238 2.211674 7.488370 3.612763 3.287476 6.960705 3.284562 2.576955 8.011481
2007-01-04 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2007-01-05 0.000000 3.739876 3.480353 8.361837 4.973224 7.547643 3.679765 0.000000 2.211674 2.496123 0.000000 3.287476 3.480353 3.284562 5.153909 2.670493

From Table 5, we can observe that each station ID now has a corresponding CHIRPS rainfall value for each day. Comparing this with Table 4, which shows the original BBWS Citarum station data, we can begin to get a sense of how well the CHIRPS data matches the ground-based measurements at these specific locations.

2.3.3.3 Nearest Neighbor Average (Simple & Inverse Distance-Weighted)

In the previous section, we extracted CHIRPS rainfall data by selecting the single nearest grid cell to each BBWS Citarum station. While straightforward, that approach might not fully capture the spatial variability of rainfall, as it relies on a single grid cell’s value. In this section, we’ll explore a more refined method that considers the values of multiple nearby grid cells, providing a more representative estimate of rainfall at each station’s location.

We’ll discuss two variations of this method:

  1. Simple Nearest Neighbor Average: This method calculates the average rainfall from a specified number (n) of the nearest CHIRPS grid cells to each station, treating all neighboring cells equally.
  2. Inverse Distance-Weighted (IDW) Nearest Neighbor Average: This method also averages rainfall from the n nearest grid cells, but it gives more weight to closer cells and less weight to farther ones, under the assumption that closer cells are more likely to be representative of the rainfall at the station’s location.

Concept

The core idea behind both methods is to use the rainfall values from multiple CHIRPS grid cells surrounding a station to estimate the rainfall at that station’s location. Instead of relying on a single grid cell, we consider a neighborhood of cells.

  • Simple Average: Imagine drawing a circle around a station and selecting the n closest CHIRPS grid cell centers that fall within that circle. In the simple average method, we calculate the arithmetic mean of the rainfall values from these n cells, treating each cell’s contribution equally.
  • Inverse Distance Weighting: The IDW method also considers the n nearest grid cells, but it assigns weights to each cell based on its distance to the station. Closer cells receive higher weights, while farther cells receive lower weights. The weighted average is then calculated, giving more influence to the rainfall values of closer cells.

Methodology

Here’s how we implement these methods:

  1. Identify Nearest Neighbors: For each BBWS Citarum station, we find the n closest CHIRPS grid cell centers. This is similar to what we did in the previous section (2.3.3.2 Nearest Grid Cell), but instead of selecting only the single nearest cell, we select the n nearest cells. We will explore n = 3, 4, and 5.

  2. Calculate Average (Simple): For the simple average, we calculate the arithmetic mean of the rainfall values from the n nearest cells for each day.

  3. Calculate Weights (IDW): For the IDW method, we first calculate weights for each of the n nearest cells based on their distance to the station. The weight for each cell is inversely proportional to its distance, often raised to a power (e.g., squared or cubed). A common formula for calculating weights is:

    \[ w_i = \frac{1}{d_i^p} \]

    where:

    • \(w_i\) is the weight of the i-th nearest grid cell.
    • \(d_i\) is the distance between the station and the i-th nearest grid cell.
    • \(p\) is a power parameter that controls the degree of weighting (commonly 2 or 3).

    These weights are then normalized so that they sum up to 1:

    \[ W_i = \frac{w_i}{\sum_{j=1}^{n} w_j} \]

    where \(W_i\) is the normalized weight of the i-th cell.

  4. Calculate Weighted Average (IDW): We multiply each of the n nearest cells’ rainfall values by their corresponding normalized weights and sum these products to obtain the IDW average for each day.

Mathematical Representation

Simple Average:

For each BBWS Citarum station s, the simple average rainfall on day d can be represented as:

\[ \text{Simple Average Rainfall}_{s,d} = \frac{1}{n} \sum_{i=1}^{n} R_{i,d} \]

where:

  • \(n\) is the number of nearest neighbors considered.
  • \(R_{i,d}\) is the rainfall value from the i-th nearest CHIRPS grid cell on day d.

Inverse Distance-Weighted Average:

For each BBWS Citarum station s, the IDW average rainfall on day d can be represented as:

\[ \text{IDW Average Rainfall}_{s,d} = \sum_{i=1}^{n} W_i * R_{i,d} \]

where:

  • \(n\) is the number of nearest neighbors considered.
  • \(W_i\) is the normalized weight of the i-th nearest CHIRPS grid cell.
  • \(R_{i,d}\) is the rainfall value from the i-th nearest CHIRPS grid cell on day d.
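A small numeric example may help here. The sketch below computes the normalized weights and both averages for one hypothetical station with n = 3 neighbors; the distances and rainfall values are made up, and `idw_weights` is a helper name introduced for illustration. The handling of a zero distance (using only the coincident cell) is an assumption of this sketch, not something the notebook specifies.

```python
import numpy as np


def idw_weights(distances, p=2.0):
    """Normalized inverse-distance weights W_i for one station."""
    d = np.asarray(distances, dtype=float)
    if np.any(d == 0):
        # Assumption: if the station sits exactly on a cell center,
        # use that cell only and avoid dividing by zero.
        w = (d == 0).astype(float)
    else:
        w = 1.0 / d**p
    return w / w.sum()


# Hypothetical distances from one station to its 3 nearest cell centers.
distances = np.array([1.0, 2.0, 2.0])

# Hypothetical rainfall (mm) at those cells: rows are days, columns are cells.
rain = np.array([
    [6.0, 3.0, 0.0],
    [0.0, 0.0, 0.0],
])

W = idw_weights(distances, p=2)
simple_avg = rain.mean(axis=1)  # (1/n) * sum_i R_{i,d}
idw_avg = rain @ W              # sum_i W_i * R_{i,d}
print(W, simple_avg, idw_avg)
```

With p = 2 the raw weights are 1, 0.25, and 0.25, which normalize to 2/3, 1/6, and 1/6. On the first day the simple average is 3.0 mm while the IDW average is 4.5 mm, because the closest cell, which has the most rain, dominates. In practice the distances would come from the same neighbor search used to find the n nearest cells.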

Implementation

We’ve implemented these methods using a combination of xarray and scipy.spatial.KDTree. The code efficiently identifies the n nearest CHIRPS grid cells to each station, calculates the simple or IDW average rainfall, and stores the results in a pandas DataFrame. In the examples below, we use n = 3 nearest neighbors and a power parameter p = 2 for the IDW calculations.

Figure 13: Interactive Map of Nearest Neighbours CHIRPS to BBWS Stations (n=3)

From Figure 13, we can observe that for each station, the three nearest CHIRPS grid cells are generally located in close proximity. However, the distances and spatial arrangements vary, reflecting the non-uniform distribution of the BBWS Citarum stations and the fixed grid structure of CHIRPS.

To get a better sense of the data produced by these methods, let’s examine the first five days of rainfall estimates obtained using both the simple average and IDW average with n=3. Table 6 and Table 7 present these results. From Table 6, we can observe the daily rainfall values at each BBWS Citarum station, calculated as the simple average of the 3 nearest CHIRPS grid cells. Similarly, Table 7 shows the daily rainfall values at each station, but calculated using the inverse distance-weighted average of the 3 nearest CHIRPS grid cells.

Table 6: Simple Average of Nearest Neighbours CHIRPS (n=3)
date hk_P0406 hk_P0424 hk_164 hk_174P183 hk_P0407 hk_P0408 hk_P0409 hk_P04101560a hk_P0415 hk_P0416 hk_P0419 hk_P0420 hk_P0421 hk_P0422 hk_P0423 hk_196a
2007-01-01 3.604767 3.685891 4.895231 0.000000 0.000000 1.545721 0.000000 3.029891 2.428898 5.558645 3.604767 6.142766 4.265182 3.086088 5.130133 5.624659
2007-01-02 5.897433 0.000000 5.223307 0.000000 0.000000 0.000000 1.226588 6.059783 4.857796 3.152276 5.897433 0.000000 6.459333 6.172176 1.716100 3.847845
2007-01-03 4.916866 3.685891 6.887390 4.475606 4.829863 6.019907 4.963375 3.029891 2.069551 6.183581 4.916866 3.679719 6.459333 3.086088 2.565066 7.401474
2007-01-04 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2007-01-05 0.000000 3.685891 3.027674 7.199633 6.223503 6.019907 4.963375 0.737225 2.428898 3.359381 0.000000 3.679719 3.229666 3.086088 5.130133 2.812330

Table 7: Inverse Distance Weighting of Nearest Neighbours CHIRPS (n=3)
date hk_P0406 hk_P0424 hk_164 hk_174P183 hk_P0407 hk_P0408 hk_P0409 hk_P04101560a hk_P0415 hk_P0416 hk_P0419 hk_P0420 hk_P0421 hk_P0422 hk_P0423 hk_196a
2007-01-01 3.316454 3.711606 4.034970 0.000000 0.000000 0.365381 0.000000 3.231315 2.253747 5.607171 3.599759 5.79907 4.293362 3.256142 5.145240 5.423713
2007-01-02 6.461699 0.000000 6.425526 0.000000 0.000000 0.000000 2.113001 6.462630 4.507495 2.645160 6.306923 0.00000 6.512120 6.512284 2.236372 3.036226
2007-01-03 3.487663 3.711606 6.898074 4.419014 5.006313 7.075562 4.430067 3.231315 2.174845 6.435251 4.492354 3.48122 6.512120 3.256142 2.572620 7.811200
2007-01-04 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000
2007-01-05 0.000000 3.711606 3.330900 7.591432 5.339226 7.075562 4.430067 0.097714 2.253747 3.135316 0.000000 3.48122 3.256060 3.256142 5.145240 2.711856

Comparing these tables with Table 4 (original BBWS Citarum data) and Table 5 (single nearest grid cell), we can start to assess how these averaging methods might affect the agreement between CHIRPS and BBWS Citarum data. A more detailed quantitative comparison will be performed in Chapter 3: Exploratory Data Analysis. In the next section, we will explore another method for aggregating CHIRPS data based on Thiessen polygons.

2.3.3.4 Thiessen Polygon Weighted Average

In the previous sections, we explored methods for extracting CHIRPS data based on proximity to BBWS Citarum stations. Now, we’ll introduce an approach that leverages the concept of Thiessen polygons, which we used in Section 2.3.2.3 to calculate a weighted average for the BBWS Citarum data. In this section, we’ll apply Thiessen polygons to aggregate CHIRPS data, providing a way to compare the two datasets based on areas of influence rather than just proximity. This method will result in a single daily rainfall value for the entire watershed, incorporating the spatial weights derived from the Thiessen polygons.

Concept

Recall that Thiessen polygons define areas around each BBWS Citarum station where every point within a polygon is closer to that station than to any other station. We can use these same polygons to aggregate CHIRPS data, effectively assigning portions of the CHIRPS grid to each station based on these areas of influence.

In this approach, we will first calculate a simple average of the rainfall values from all CHIRPS grid cells falling within each polygon. Then, we’ll weight these polygon averages by their respective areas (normalized to sum to 1) to obtain a single, watershed-wide weighted average rainfall value for each day.

Methodology

Here’s how we implement this simplified method:

  1. Overlay Thiessen Polygons and CHIRPS Grid: We overlay the Thiessen polygons (created for the BBWS Citarum stations) onto the CHIRPS grid. This allows us to identify which CHIRPS grid cells fall within each polygon.
  2. Calculate Simple Average for Each Polygon: For each polygon and each day, we calculate the arithmetic mean of the rainfall values from all CHIRPS grid cells assigned to that polygon. Each grid cell is treated as a point, so a cell belongs to the polygon that contains its grid point. This gives us a representative rainfall value for each polygon for each day.
  3. Calculate Watershed-Wide Average: We weight the polygon averages calculated in step 2 by their respective areas (normalized to sum to 1, as used in Section 2.3.2.3) and sum them to get a watershed-wide average for each day.
  4. Repeat for All Days: We repeat steps 2 and 3 for all days in our time series.

Mathematical Representation

The Thiessen polygon weighted average CHIRPS rainfall for the entire watershed on day d can be represented as:

\[ \text{Weighted Average Rainfall}_{d} = \sum_{p=1}^{P} \left( \text{Polygon Weight}_p \times \text{Simple Average Rainfall}_{p,d} \right) \]

Where:

\[ \text{Simple Average Rainfall}_{p,d} = \frac{1}{n_p} \sum_{i=1}^{n_p} R_{i,d} \]

  • \(P\) is the total number of Thiessen polygons (equal to the number of BBWS Citarum stations).
  • \(\text{Polygon Weight}_p\) is the normalized area weight of polygon p (as used in Section 2.3.2.3), so that the weights sum to 1 over all polygons.
  • \(\text{Simple Average Rainfall}_{p,d}\) is the simple average of CHIRPS rainfall within polygon p on day d.
  • \(n_p\) is the number of CHIRPS grid cells within polygon p.
  • \(R_{i,d}\) is the rainfall value from the i-th CHIRPS grid cell within polygon p on day d.

Implementation

The implementation of this simplified method involves spatial operations to determine which CHIRPS grid cells fall within each Thiessen polygon, calculating the simple average for each polygon, and then performing the weighted average to obtain a single watershed-wide value for each day. We store the final result in a pandas DataFrame called chirps_thiessen_weighted_avg.
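The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's actual implementation: the function names (`polygon_area`, `point_in_polygon`, `thiessen_weighted_average`), the two toy polygons, and the rainfall values are all hypothetical. Grid cells are treated as points (their centers), and every polygon is assumed to contain at least one cell.

```python
from statistics import mean


def polygon_area(vertices):
    """Shoelace formula: area of a simple polygon given as (x, y) vertices."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if the point (x, y) lies inside the polygon."""
    inside = False
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def thiessen_weighted_average(polygons, cell_centers, rainfall_by_day):
    """Return {day: area-weighted mean of the per-polygon simple averages}."""
    # Polygon weights: areas normalized to sum to 1.
    areas = {name: polygon_area(v) for name, v in polygons.items()}
    total_area = sum(areas.values())
    weights = {name: area / total_area for name, area in areas.items()}

    # Step 1: assign each grid cell (as a point) to the polygon containing it.
    cells_in_polygon = {name: [] for name in polygons}
    for idx, (x, y) in enumerate(cell_centers):
        for name, vertices in polygons.items():
            if point_in_polygon(x, y, vertices):
                cells_in_polygon[name].append(idx)
                break

    # Steps 2-4: per-polygon simple average, then weighted sum, for every day.
    result = {}
    for day, values in rainfall_by_day.items():
        result[day] = sum(
            weights[name] * mean(values[i] for i in idxs)
            for name, idxs in cells_in_polygon.items()
        )
    return result


# Hypothetical example: polygon A has area 4, polygon B has area 2.
polygons = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (3, 0), (3, 2), (2, 2)],
}
cell_centers = [(0.5, 0.5), (1.5, 1.5), (2.5, 1.0)]  # two cells in A, one in B
rainfall_by_day = {
    "2007-01-01": [3.0, 5.0, 10.0],
    "2007-01-02": [0.0, 0.0, 0.0],
}
result = thiessen_weighted_average(polygons, cell_centers, rainfall_by_day)
```

In this toy case, polygon A averages 4.0 mm and polygon B averages 10.0 mm on the first day, and the weights are 2/3 and 1/3, so the watershed-wide value is 6.0 mm. A production version would more likely rely on a geospatial library for the point-in-polygon step.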

Figure 14: Interactive Map of Nearest CHIRPS to BBWS Stations within Thiessen Polygons

Figure 14 shows that the Thiessen polygons divide the watershed into regions associated with each BBWS Citarum station. The number of CHIRPS grid points within each polygon varies, which is expected given the fixed grid structure of CHIRPS and the irregular shapes of the Thiessen polygons. While this approach provides a reasonable way to incorporate spatial information into our CHIRPS data aggregation, it relies on the simplification of treating CHIRPS grid cells as points, which might introduce some inaccuracies, especially at the edges of the polygons. However, for the purpose of comparing overall rainfall patterns between CHIRPS and BBWS Citarum data, this method offers a good balance between accuracy and computational efficiency. A more precise approach would calculate the exact overlap area between grid cells and polygons, at the cost of significantly more complex calculations.

To visualize the results of this method, we’ll create a calendar heatmap similar to the one we made for the BBWS Citarum Thiessen weighted average and CHIRPS simple watershed-wide average. Figure 15 shows the Thiessen polygon weighted average CHIRPS rainfall for the entire watershed for each year.

Figure 15: Calmap of CHIRPS Thiessen Polygon Weighted Average Rainfall (panels: 2007-2011, 2012-2016, 2017-2019)

This simplified Thiessen polygon-based aggregation provides a valuable method for comparing CHIRPS and BBWS Citarum data, resulting in a single daily value that incorporates spatial weights in a computationally efficient manner. In Chapter 3: Exploratory Data Analysis, we will perform a more detailed quantitative comparison of this method with other aggregation approaches and with the BBWS Citarum data.


2.3.4 Preparation for EDA

Having meticulously processed and aligned our BBWS Citarum and CHIRPS rainfall data, both temporally and spatially, we now stand at the threshold of exploratory data analysis (EDA). This crucial preparatory step involves consolidating our processed datasets, conducting final visual inspections, and acknowledging any limitations inherent in our methodologies. This will set a solid foundation for the in-depth exploration and comparative analysis that awaits us in Chapter 3.

2.3.4.1 Combining the Processed Datasets

Our efforts in the preceding sections have yielded several versions of processed rainfall data. For BBWS Citarum, we have the original station data, a simple watershed-wide average, and a Thiessen polygon weighted average. For CHIRPS, we have a watershed-wide average, nearest grid cell extractions, simple and inverse distance-weighted (IDW) nearest neighbor averages, and a Thiessen polygon weighted average.

To facilitate a streamlined EDA, we will combine these processed datasets into a set of summary DataFrames. Each DataFrame will represent a specific level of spatial aggregation (e.g., station-level, watershed-level) and will contain the relevant rainfall values from both BBWS Citarum and CHIRPS, aligned by date.
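One way to build such a combined, date-aligned DataFrame is sketched below with pandas. This is an illustration rather than the notebook's actual code: the variable names (`bbws_original`, `chirps_nearest`, `station_level`) are assumptions, and the toy values are copied from the first three rows of Table 8.

```python
import pandas as pd

dates = pd.date_range("2007-01-01", periods=3, freq="D")

# Toy station-level frames, one column per station (values from Table 8).
bbws_original = pd.DataFrame(
    {"hk_P0420": [2.0, 0.0, 0.0], "hk_196a": [1.0, 4.0, 0.0]},
    index=dates,
)
chirps_nearest = pd.DataFrame(
    {
        "hk_P0420": [6.574951, 0.000000, 3.287476],
        "hk_196a": [5.340987, 2.670493, 8.011481],
    },
    index=dates,
)

# Passing a dict to pd.concat uses its keys as the top level of a column
# MultiIndex; join="inner" keeps only the dates present in every frame.
station_level = pd.concat(
    {"bbws_data_original": bbws_original, "chirps_data_nearest": chirps_nearest},
    axis=1,
    join="inner",
)
station_level.index.name = "date"

print(station_level.head())
```

The two-level column index (method on top, station below) keeps every method's values for a station on the same row, so a single date lookup returns all representations at once.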

Here’s a glimpse of our combined datasets, showcasing the first five days:

Station-Level Data:

Table 8 displays the first five days of rainfall data at two BBWS Citarum stations, hk_P0420 and hk_196a, alongside the corresponding CHIRPS data extracted using the nearest grid cell, 3-nearest neighbor simple average, and 3-nearest neighbor IDW average methods.

Table 8: The first 5 rows of station-level data for stations hk_P0420 and hk_196a
            bbws_data_original    chirps_data_nearest   chirps_data_3nn       chirps_data_3nn_idw
date        hk_P0420   hk_196a    hk_P0420   hk_196a    hk_P0420   hk_196a    hk_P0420   hk_196a
2007-01-01  2.0        1.0        6.574951   5.340987   6.142766   5.624659   5.79907    5.423713
2007-01-02  0.0        4.0        0.000000   2.670493   0.000000   3.847845   0.00000    3.036226
2007-01-03  0.0        0.0        3.287476   8.011481   3.679719   7.401474   3.48122    7.811200
2007-01-04  0.0        0.0        0.000000   0.000000   0.000000   0.000000   0.00000    0.000000
2007-01-05  0.0        0.0        3.287476   2.670493   3.679719   2.812330   3.48122    2.711856

Watershed-Level Data:

Table 9 presents the first five days of watershed-averaged rainfall data. It includes the simple and Thiessen polygon weighted averages for both BBWS Citarum and CHIRPS.

Table 9: First 5 rows of watershed-level rainfall observations
date        bbws_avg  chirps_avg  bbws_thiessen_avg  chirps_thiessen_avg
2007-01-01  2.62500   3.210879    4.620090           3.145399
2007-01-02  1.72500   2.081615    2.494786           2.037377
2007-01-03  0.70000   4.336284    1.362465           4.269931
2007-01-04  0.03125   0.000000    0.009601           0.000000
2007-01-05  0.28750   3.942114    0.409467           4.168351

2.3.4.2 Visual Inspection and Observations

Before delving into quantitative analysis, it’s beneficial to visually compare the different rainfall representations we’ve created. We will focus on comparing the watershed-level aggregations using calendar heatmaps, as they provide a clear overview of temporal patterns.

Figures 16, 17, and 18 display calendar heatmaps for the periods 2007-2011, 2012-2016, and 2017-2019, respectively, comparing the four watershed-level rainfall representations:

  1. BBWS Citarum Simple Average: Simple arithmetic mean of all 16 stations.
  2. BBWS Citarum Thiessen Weighted Average: Weighted average based on Thiessen polygon areas.
  3. CHIRPS Simple Average: Simple arithmetic mean of all grid cells within the watershed.
  4. CHIRPS Thiessen Weighted Average: Weighted average based on Thiessen polygon areas, using the simplified grid-cell-as-point approach.

These visualizations allow for a quick, albeit qualitative, assessment of the agreement between the different methods and datasets. It’s crucial to remember that each year’s calendar heatmap has its own independent color scale. The colors represent the relative distribution of daily rainfall within that particular year, not across all years. Therefore, a dark square in one year does not necessarily indicate the same absolute rainfall amount as a dark square in another year.

Figure 16: Comparison of Watershed-Level Rainfall Averages (2007-2011). Panels: BBWS Citarum Simple Average, CHIRPS Simple Average, BBWS Citarum Thiessen Weighted Average, CHIRPS Thiessen Weighted Average.

Figure 17: Comparison of Watershed-Level Rainfall Averages (2012-2016). Panels: BBWS Citarum Simple Average, CHIRPS Simple Average, BBWS Citarum Thiessen Weighted Average, CHIRPS Thiessen Weighted Average.

Figure 18: Comparison of Watershed-Level Rainfall Averages (2017-2019). Panels: BBWS Citarum Simple Average, CHIRPS Simple Average, BBWS Citarum Thiessen Weighted Average, CHIRPS Thiessen Weighted Average.

Observations:

  • General Seasonal Pattern Agreement: Across all 13 years and all methods, the general seasonal patterns are remarkably consistent. Wet periods (darker colors) are consistently observed at the beginning and end of each year, typically spanning from October to April. Drier periods (lighter colors) are concentrated in the middle months, roughly from June to September. This strong agreement confirms the dominant wet and dry season cycle in the Upper Citarum River Watershed, regardless of the data source or averaging method.
  • CHIRPS vs. BBWS Citarum:
    • Smoothing: The CHIRPS-based averages (both simple and Thiessen) generally exhibit smoother transitions between wet and dry periods compared to the BBWS Citarum averages. This is likely attributed to the spatial smoothing inherent in the CHIRPS data, which averages rainfall over larger grid cells. The BBWS Citarum averages, especially the Thiessen weighted average, tend to show more abrupt changes, potentially capturing localized rainfall events more distinctly.
    • Within-Year Rainfall Distribution: We can compare how each method distributes rainfall within each year. For example, in some years, CHIRPS shows a more even distribution of rainfall throughout the wet season, while BBWS Citarum might show more distinct peaks. This suggests differences in how each dataset captures the intensity and variability of rainfall within a year.
  • Thiessen vs. Simple Average:
    • Enhanced Extremes (BBWS Citarum): The Thiessen weighted average for BBWS Citarum often displays slightly more pronounced extremes compared to its simple average counterpart. Within a given year, the Thiessen method tends to have more very dark and very light squares, indicating its sensitivity to localized heavy rainfall or dry spells captured by stations with larger Thiessen weights.
    • Subtle Differences (CHIRPS): The differences between the Thiessen weighted average and the simple average for CHIRPS are generally more subtle. However, within some years, the Thiessen method does show a slightly wider range of colors, suggesting that the weighting does have some impact on the representation of rainfall variability.
  • Inter-annual Variability: While we cannot directly compare rainfall amounts between years using these heatmaps, we can observe variations in the distribution of rainfall within each year. For instance, some years have more days with intense rainfall (darker squares) spread throughout the wet season, while others have rainfall concentrated in fewer, more intense events.
  • Consistency Across Methods: Despite the differences highlighted above, it’s important to note that all four methods generally agree on the overall characterization of the rainfall pattern within each year. If one method shows a year with more frequent heavy rainfall events, the other methods tend to show a similar pattern for that same year.

2.3.4.3 Limitations and Considerations

While our data processing has been thorough, it’s important to acknowledge certain limitations:

  • Simplified Thiessen Polygon Approach for CHIRPS: In our calculation of the Thiessen weighted average for CHIRPS, we treated grid cells as points. A more accurate approach would involve calculating the exact overlap between each grid cell and the Thiessen polygons. However, this would add significant computational complexity. We opted for the simplified method for efficiency, recognizing that it might introduce minor inaccuracies.
  • Spatial Resolution Differences: CHIRPS data has a spatial resolution of approximately 5 km (0.05°), while BBWS Citarum data represents point measurements. This inherent difference in spatial scale can lead to discrepancies, especially in areas with high rainfall variability.

3 Exploratory Data Analysis (EDA)

In this chapter, we embark on a journey to understand the intricacies of rainfall patterns within the Upper Citarum River Watershed, as depicted by our two primary data sources: the satellite-derived CHIRPS dataset and the ground-based measurements from BBWS Citarum rain gauges. Through exploratory data analysis (EDA), we aim to uncover the fundamental characteristics of each dataset, compare their statistical properties, and identify any significant discrepancies that might exist. This process will involve a combination of descriptive statistics, time series visualizations, and spatial analysis, laying the groundwork for a more in-depth comparative assessment in Chapter 4.

3.1 Descriptive Statistics

Descriptive statistics provide a concise summary of the main features of a dataset. By calculating and comparing key statistical measures, we can gain valuable insights into the central tendency, variability, and distribution of rainfall as captured by CHIRPS and BBWS Citarum. This section will explore these statistics at two levels: the watershed level, where we consider aggregated rainfall values, and the individual station level, where we analyze rainfall at specific locations.

3.1.1 Watershed-Level Statistics

Here, we examine the overall statistical properties of rainfall across the entire Upper Citarum River Watershed. We’ll calculate and compare descriptive statistics for each of the four watershed-level rainfall representations we created in Chapter 2:

  1. BBWS Citarum Simple Average: The arithmetic mean of daily rainfall from all 16 BBWS Citarum stations.
  2. BBWS Citarum Thiessen Weighted Average: A weighted average of daily rainfall, where each station’s contribution is weighted by its corresponding Thiessen polygon’s area.
  3. CHIRPS Simple Average: The arithmetic mean of daily rainfall from all CHIRPS grid cells within the watershed.
  4. CHIRPS Thiessen Weighted Average: A weighted average of daily CHIRPS rainfall, where the mean of the grid cells within each Thiessen polygon is weighted by that polygon’s area (using the simplified grid-cell-as-point approach).

For each of these representations, we will calculate the following descriptive statistics:

  • Mean: The average daily rainfall over the entire study period (2007-2019).
  • Median: The middle value when daily rainfall amounts are sorted.
  • Standard Deviation: A measure of the spread or variability of daily rainfall.
  • Minimum and Maximum: The lowest and highest daily rainfall values recorded during the study period.
  • Percentiles (25th, 50th, 75th, 90th, 95th): These values indicate the daily rainfall amount below which a certain percentage of the data falls. For example, the 95th percentile represents the daily rainfall amount that is exceeded only 5% of the time.
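These statistics can be computed in one call with pandas, as in the minimal sketch below. The DataFrame name `watershed_level` and the five-day sample (copied from Table 9) are assumptions for illustration, so the numbers it prints will not match the full 2007-2019 results.

```python
import pandas as pd

# Toy watershed-level frame: the first five days from Table 9 (mm).
watershed_level = pd.DataFrame(
    {
        "bbws_avg": [2.62500, 1.72500, 0.70000, 0.03125, 0.28750],
        "chirps_avg": [3.210879, 2.081615, 4.336284, 0.000000, 3.942114],
    },
    index=pd.date_range("2007-01-01", periods=5, freq="D"),
)

# describe() reports mean, std (sample standard deviation, ddof=1), min, max,
# and the requested percentiles; the 50% row is the median.
summary = watershed_level.describe(
    percentiles=[0.25, 0.5, 0.75, 0.9, 0.95]
).drop(index="count")

print(summary)
```

Dropping the `count` row leaves the rows in the order mean, std, min, the percentiles, and max.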

The table below (Table 10) presents the calculated descriptive statistics for each of the four watershed-level rainfall representations:

Table 10: Descriptive statistics for watershed-level rainfall representations (daily data, 2007-2019). All values are in millimeters (mm).
statistic  bbws_avg   chirps_avg  bbws_thiessen_avg  chirps_thiessen_avg
mean       5.771364   7.963715    5.711824           8.015833
std        6.747801   9.381002    6.703199           9.441311
min        0.000000   0.000000    0.000000           0.000000
25%        0.437500   0.000000    0.341505           0.000000
50%        3.328125   5.230705    3.235180           5.253154
75%        8.845313   12.908255   8.792316           13.019210
90%        15.337500  20.278960   15.076780          20.413450
95%        19.870625  25.654488   19.764056          25.987347
max        47.462500  80.774963   49.610116          79.715576

Several key observations can be drawn from these statistics:

  1. CHIRPS Overestimation: The higher mean and median daily rainfall values in both CHIRPS-based representations (Simple and Thiessen Weighted) compared to the BBWS Citarum averages indicate that CHIRPS tends to overestimate daily rainfall amounts across the watershed when compared to the ground truth. The CHIRPS mean is about 38% to 40% higher, and the median about 57% to 62% higher, than the corresponding BBWS Citarum values (the lower figure in each range is for the simple averages, the higher for the Thiessen weighted averages).

  2. CHIRPS Overestimation of Variability: The notably higher standard deviation in CHIRPS suggests that it overestimates the variability of daily rainfall. This, combined with the much higher maximum values in CHIRPS, suggests that it tends to overestimate the intensity of high rainfall events, when compared to the ground truth, at least at the watershed level.

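  3. CHIRPS Underestimation of Low Rainfall Events: The 25th percentile is zero for both CHIRPS representations, meaning CHIRPS estimated no watershed-average rainfall on at least 25% of days. The BBWS Citarum averages, by contrast, have nonzero 25th percentiles (about 0.34 to 0.44 mm), so fewer than 25% of days were completely dry across the gauge network. Because percentiles are computed separately for each series, this does not show that the dry days coincide, but it does suggest that CHIRPS may underestimate the frequency of low rainfall events or miss them altogether.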

  4. CHIRPS and BBWS Citarum Closer Agreement on Larger Events: While CHIRPS consistently overestimates rainfall in the upper percentiles (75th, 90th, 95th), the relative differences between CHIRPS and BBWS Citarum are smaller at these higher percentiles (e.g., 95th percentile: ~30%) than at lower ones (e.g., median: roughly 57% to 62%). This might indicate that CHIRPS and BBWS Citarum are in closer agreement when it comes to detecting and quantifying larger rainfall events, although CHIRPS still tends to overestimate their magnitude. The maxima are an exception: the largest CHIRPS value is roughly 60% to 70% higher than the largest BBWS Citarum value.

  5. Limited Impact of Thiessen Weighting at Watershed Level: The relatively minor differences between the simple average and the Thiessen weighted average for both BBWS Citarum and CHIRPS suggest that, at the watershed level, the weighting based on Thiessen polygon areas does not significantly alter the overall statistical properties compared to a simple average.
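The percentages quoted in these observations can be checked with a few lines of arithmetic. The sketch below uses the simple-average columns of Table 10 (`bbws_avg` and `chirps_avg`); the helper name `percent_difference` is an assumption introduced for illustration.

```python
# Selected values from Table 10 (mm): (BBWS Citarum, CHIRPS) simple averages.
table_10 = {
    "mean": (5.771364, 7.963715),
    "50%": (3.328125, 5.230705),
    "75%": (8.845313, 12.908255),
    "90%": (15.337500, 20.278960),
    "95%": (19.870625, 25.654488),
}


def percent_difference(reference, estimate):
    """Relative difference of the estimate from the reference, in percent."""
    return 100.0 * (estimate - reference) / reference


relative_bias = {
    stat: percent_difference(bbws, chirps)
    for stat, (bbws, chirps) in table_10.items()
}

for stat, value in relative_bias.items():
    print(f"{stat:>4}: CHIRPS is {value:.1f}% higher than BBWS Citarum")
```

The relative difference shrinks from the median toward the 95th percentile, which is the pattern described in observation 4.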

These findings reveal significant discrepancies between the satellite-derived CHIRPS data and the ground-based BBWS Citarum measurements at the watershed level. CHIRPS tends to overestimate rainfall, particularly for moderate events, and overestimates the overall variability of daily rainfall. It also appears to underestimate the frequency of low rainfall events, when compared to the ground truth. While the agreement between CHIRPS and BBWS Citarum seems better for larger rainfall events, CHIRPS still tends to overestimate their magnitude. These discrepancies could have implications for hydrological modeling, water resource management, and the interpretation of rainfall patterns in the Upper Citarum River Watershed. For instance, using CHIRPS data directly without correction could lead to overestimations of water availability, potentially affecting irrigation planning or flood risk assessments.

However, it is important to note that while these watershed-level discrepancies are significant, they do not necessarily negate the potential value of CHIRPS data. CHIRPS, with its high spatial resolution and continuous coverage, offers a unique opportunity to understand rainfall patterns in areas where ground-based measurements are sparse or unavailable. By delving deeper into the station-level analysis, we can identify locations or conditions where CHIRPS performs relatively well, and potentially develop correction methods to improve its accuracy. This more granular understanding can help us harness the strengths of CHIRPS while mitigating its weaknesses, ultimately enabling us to utilize this valuable data source for improved water resource management, particularly in regions with limited ground-based monitoring.

Footnotes

  1. https://www.chc.ucsb.edu/data/chirps

Citation

BibTeX citation:
@online{megariansyah2024,
  author = {Megariansyah, Taruma Sakti},
  title = {Evaluating {CHIRPS} with {Local} {Rainfall} {Data}},
  date = {2024-12-27},
  url = {https://dev.taruma.info/rf-comp-id/notebook_en.html},
  langid = {en},
  abstract = {WIP. \{\{\textless{} lipsum 1 \textgreater\}\}}
}
For attribution, please cite this work as:
Megariansyah, Taruma Sakti. 2024. “Evaluating CHIRPS with Local Rainfall Data.” December 27, 2024. https://dev.taruma.info/rf-comp-id/notebook_en.html.